api-reference-examples/python-notebook/Getting Started with Sharing.ipynb (274 lines of code) (raw):
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Getting Started with ThreatExchange Sharing \n",
"\n",
"**Purpose**\n",
" \n",
"The ThreatExchange APIs are designed to make the sharing of indicators, and the connections between them, simple. Additionally, the APIs provide flexible options for deciding whom you share with: yourself, individual members, groups, and everyone!\n",
"\n",
"**What you need**\n",
"\n",
"Before getting started, you'll need a few things installed and some data. \n",
"\n",
" - [Pytx](https://pytx.readthedocs.org/en/latest/installation.html) for ThreatExchange access\n",
" - [Pandas](http://pandas.pydata.org/) for data manipulation and analysis\n",
" - A CSV file with data suitable for sharing\n",
" \n",
"All of the python packages mentioned below can easily be installed via \n",
"\n",
"```\n",
"pip install <package_name>\n",
"```\n",
"\n",
"### Setup a ThreatExchange `access_token`\n",
"\n",
"If you don't already have an `access_token` for your app, use the [Facebook Access Token Tool]( https://developers.facebook.com/tools/accesstoken/) to get one."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from pytx.access_token import access_token\n",
"\n",
"# Specify the location of your token via one of several ways:\n",
"# https://pytx.readthedocs.org/en/latest/pytx.access_token.html\n",
"access_token()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, enable debug level logging"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from pytx.logger import setup_logger\n",
"\n",
"# Uncomment this, if you want debug logging enabled\n",
"# setup_logger(log_file=\"pytx.log\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure Privacy Settings\n",
"\n",
"This will configure the API defaults for when you share data. There are [multiple levels of privacy](https://developers.facebook.com/docs/threat-exchange/reference/privacy/) to choose from. \n",
"\n",
"The code below will publish data to a whitelist that only your appID can see, for convenient testing."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from pytx.access_token import get_app_id\n",
"from pytx.vocabulary import PrivacyType as pt\n",
"\n",
"# Choose the privacy level from \n",
"# https://pytx.readthedocs.org/en/latest/pytx.vocabulary.html#pytx.vocabulary.PrivacyType\n",
"privacy_type = pt.HAS_WHITELIST \n",
"\n",
"# Populate this with strings of app IDs or privacy groups. If using pt.VISIBLE, set to None\n",
"privacy_members=[str(get_app_id())] # Will also take other member or privacy group IDs as strings"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Define default fields for sharing\n",
"\n",
"Sometimes, your CSV data is a raw list of IPs or domains. Use this map to set default fields on the descriptors that are created. Don't worry though, if your data *does* have any of the defaults you've defined, we won't clobber it.\n",
"\n",
"In this example, our defaults are set for sharing manually curated data of malicious IP addresses from a botnet."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from pytx.vocabulary import Attack as a\n",
"from pytx.vocabulary import ReviewStatus as rs\n",
"from pytx.vocabulary import Severity as s\n",
"from pytx.vocabulary import ShareLevel as sl\n",
"from pytx.vocabulary import Status as st\n",
"from pytx.vocabulary import ThreatDescriptor as td\n",
"from pytx.vocabulary import ThreatType as tt\n",
"from pytx.vocabulary import Types as t\n",
"\n",
"# See: https://pytx.readthedocs.org/en/latest/pytx.vocabulary.html#pytx.vocabulary.ThreatDescriptor\n",
"default_fields = {\n",
" #td.ATTACK_TYPE: a.MALWARE, # TODO uncomment when PR #120 gets added to Pytx in pip\n",
" td.CONFIDENCE: 75,\n",
" #td.EXPIRED_ON: '2016-02-25 00:00:00+0000',\n",
" td.PRIVACY_TYPE: privacy_type,\n",
" td.REVIEW_STATUS: rs.REVIEWED_MANUALLY,\n",
" td.SHARE_LEVEL: sl.AMBER,\n",
" td.SEVERITY: s.SEVERE,\n",
" td.STATUS: st.MALICIOUS,\n",
" td.THREAT_TYPE: tt.MALICIOUS_IP,\n",
" td.TYPE: t.IP_ADDRESS,\n",
" td.DESCRIPTION: '[example][tags] Test description'\n",
"}\n",
"\n",
"# Add in privacy members, as needed\n",
"if privacy_members is not None:\n",
" default_fields[td.PRIVACY_MEMBERS] = ','.join(privacy_members)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Share data from a file\n",
"\n",
"Grabs the data from a local CSV file and publishes it to ThreatExchange. We interpret the columns in \n",
"the data according to [Pytx's Vocabulary](https://github.com/facebook/ThreatExchange/blob/master/pytx/pytx/vocabulary.py)\n",
"\n",
"**At a minimum**, your CSV file should have one column, named `indicator`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import csv\n",
"import pytx.errors\n",
"from pytx import ThreatDescriptor\n",
"\n",
"# The file to upload\n",
"file = 'test_share.csv'\n",
"\n",
"# Load the CSV and serially publish it\n",
"ind_count = 0\n",
"fail_count = 0\n",
"with open(file, 'rb') as csvfile:\n",
" reader = csv.DictReader(csvfile, delimiter=',', quotechar='\"')\n",
" for row in reader:\n",
" try:\n",
" fields = default_fields.copy()\n",
" fields.update(row)\n",
" result = ThreatDescriptor.new(params=fields)\n",
" except Exception, e:\n",
" print 'Unable to upload' + row['indicator'] + 'due to ' + result['message'] + \"\\n\"\n",
" fail_count = fail_count + 1\n",
" else:\n",
" ind_count = ind_count + 1\n",
"print \"Done publishing %d indicators with %d failures!\" % (ind_count, fail_count)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Confirm your data was shared\n",
"\n",
"Now, we do a quick search to confirm the data was published correctly to ThreatExchange."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from datetime import datetime, timedelta\n",
"from time import strftime\n",
"import pandas as pd\n",
"from pytx import ThreatDescriptor\n",
"from pytx.vocabulary import ThreatExchange as te\n",
"\n",
"# Define your search string and other params, see \n",
"# https://pytx.readthedocs.org/en/latest/pytx.common.html#pytx.common.Common.objects\n",
"# for the full list of options\n",
"results = ThreatDescriptor.objects(\n",
" fields=ThreatDescriptor._default_fields,\n",
" limit=search_params[te.LIMIT],\n",
" owner=str(get_app_id()),\n",
" since=strftime('%Y-%m-%d %H:%m:%S +0000', (datetime.utcnow() + timedelta(hours=(-1))).timetuple()), \n",
" until=strftime('%Y-%m-%d %H:%m:%S +0000', datetime.utcnow().timetuple())\n",
")\n",
"\n",
"data_frame = pd.DataFrame([result.to_dict() for result in results])\n",
"data_frame.head(n=10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Excellent, we've shared data!\n",
"\n",
"Now that we've walked through a simple example, try out the following exercises:\n",
"\n",
" - Share a list of malicious URLs with multiple members\n",
" - Share a list of malicious domain names with a privacy group"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Put your Python code here!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 0
}